The Pareto Regret Frontier for Bandits
نویسنده
چکیده
Given a multi-armed bandit problem it may be desirable to achieve a smallerthan-usual worst-case regret for some special actions. I show that the price for such unbalanced worst-case regret guarantees is rather high. Specifically, if an algorithm enjoys a worst-case regret of B with respect to some action, then there must exist another action for which the worst-case regret is at least Ω(nK/B), where n is the horizon and K the number of actions. I also give upper bounds in both the stochastic and adversarial settings showing that this result cannot be improved. For the stochastic case the pareto regret frontier is characterised exactly up to constant factors.
منابع مشابه
Knowledge Gradient for Multi-objective Multi-armed Bandit Algorithms
We extend knowledge gradient (KG) policy for the multi-objective, multi-armed bandits problem to efficiently explore the Pareto optimal arms. We consider two partial order relationships to order the mean vectors, i.e. Pareto and scalarized functions. Pareto KG finds the optimal arms using Pareto search, while the scalarizations-KG transform the multi-objective arms into one-objective arm to fin...
متن کاملFighting Bandits with a New Kind of Smoothness
We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the O( √ NT ) minimax regret with a smaller constant factor. Second, we show that a wide ...
متن کاملThe Pareto Regret Frontier
Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more comple...
متن کاملRobustness in portfolio optimization based on minimax regret approach
Portfolio optimization is one of the most important issues for effective and economic investment. There is plenty of research in the literature addressing this issue. Most of these pieces of research attempt to make the Markowitz’s primary portfolio selection model more realistic or seek to solve the model for obtaining fairly optimum portfolios. An efficient frontier in the ...
متن کاملAn algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is O ( K √ n log n ) and against stochastic bandits the pseudo-regret is O ( ∑ i(log n)/∆i). We also show that no algorithm with O (log n) pseudo-regret against stochastic bandits can achieve Õ ( √ n) expected regret against adaptive...
متن کامل